Structured Orthogonal Inversion of Block p-Cyclic Matrices on Multicores with GPU Accelerators

Authors

  • Sergiy Gogolenko
  • Zhaojun Bai
  • Richard Scalettar
Abstract

We present a block structured orthogonal factorization (BSOF) algorithm and its parallelization for computing the inverse of block p-cyclic matrices. We target high performance on multicore systems with GPU accelerators. We provide a quantitative performance model for optimal host-device load balancing and validate the model through numerical tests. Benchmarking results show that the parallel BSOF-based inversion algorithm attains up to 90% of DGEMM performance on hybrid CPU+GPU systems.
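
As a reading aid, the matrix structure named in the abstract can be written out explicitly. The block layout and the symbols A_i, B_i, Q, and R below are assumptions based on a common convention for block p-cyclic matrices; they are not notation taken from this abstract, and the paper's own factorization may organize the blocks differently:

```latex
H =
\begin{bmatrix}
A_1 &        &        &         & B_p \\
B_1 & A_2    &        &         &     \\
    & B_2    & A_3    &         &     \\
    &        & \ddots & \ddots  &     \\
    &        &        & B_{p-1} & A_p
\end{bmatrix},
\qquad
H = QR \;\Longrightarrow\; H^{-1} = R^{-1} Q^{T}.
```

Here each A_i and B_i is assumed to be a square block of the same size, Q is orthogonal, and R is upper triangular. Under these assumptions, inversion via an orthogonal factorization reduces to inverting R and applying the transpose of Q.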


Similar resources

Structured Orthogonal Inversion of Block p-Cyclic Matrices on Multicore with GPU

We present a block structured orthogonal factorization (BSOF) algorithm and its parallelization for computing the inversion of block p-cyclic matrices. We aim at the high performance on multicores with GPU accelerators. We provide a quantitative performance model for optimal host-device load balance, and validate the model through numerical tests. Benchmarking results show that the parallel BSOF...


Structured Condition Numbers for Invariant Subspaces

Invariant subspaces of structured matrices are sometimes better conditioned with respect to structured perturbations than with respect to general perturbations. Sometimes they are not. This paper proposes an appropriate condition number c_S for invariant subspaces subject to structured perturbations. Several examples compare c_S with the unstructured condition number. The examples include block ...


GPGPU parallel algorithms for structured-grid CFD codes

A new high-performance general-purpose graphics processing unit (GPGPU) computational fluid dynamics (CFD) library is introduced for use with structured-grid CFD algorithms. A novel set of parallel tridiagonal matrix solvers, implemented in CUDA, is included for use with structured-grid CFD algorithms. The solver library supports both scalar and block-tridiagonal matrices suitable for approxima...


A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators

We present a Cholesky factorization for multicore systems with GPU accelerators. The challenges in developing scalable high performance algorithms for these emerging systems stem from their heterogeneity, massive parallelism, and the huge gap between the GPUs’ compute power vs the CPU-GPU communication speed. We show an approach that is largely based on software infrastructures that have alread...


Hybrid Multicore Cholesky Factorization with Multiple GPU Accelerators

We present a Cholesky factorization for multicore with GPU accelerators. The challenges in developing scalable high performance algorithms for these emerging systems stem from their heterogeneity, massive parallelism, and the huge gap between the GPUs’ compute power vs the CPU-GPU communication speed. We show an approach that is largely based on software infrastructures that have already been d...


Publication date: 2014